    Contributions of temporal encodings of voicing, voicelessness, fundamental frequency, and amplitude variation to audiovisual and auditory speech perception

    Auditory and audio-visual speech perception was investigated using auditory signals with an invariant spectral envelope that temporally encoded the presence of voiced and voiceless excitation, variations in amplitude envelope, and F0. In experiment 1, the contribution of the timing of voicing to consonant identification was compared with the additional effects of variations in F0 and in the amplitude of voiced speech. In audio-visual conditions only, amplitude variation slightly increased accuracy overall and for manner features. F0 variation slightly increased overall accuracy and manner perception in both auditory and audio-visual conditions. Experiment 2 examined the consonant information derived from the presence and amplitude variation of voiceless speech, in addition to that from voicing, F0, and voiced-speech amplitude. A binary indication of voiceless excitation improved accuracy overall and for voicing and manner. The amplitude variation of voiceless speech produced only a small increment in place-of-articulation scores. A final experiment examined audio-visual sentence perception using encodings of voiceless excitation and amplitude variation added to a signal representing voicing and F0. Amplitude variation contributed to sentence perception, but voiceless excitation did not. The timing of voiced and voiceless excitation appears to provide the major temporal cues to consonant identity.
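
    The stimuli described here are "speech pattern" signals: a fixed spectral envelope carrying only temporal cues. As a rough illustration, the sketch below is an assumption, not the authors' synthesis procedure (the function name, carrier choice, and filter values are invented): it builds such a signal from per-frame voicing decisions, an F0 track, and an amplitude track, using a square-wave carrier for voiced frames, noise for voiceless frames, and one fixed shaping filter so the spectral envelope never varies.

```python
import numpy as np
from scipy.signal import lfilter

def encode_temporal_cues(voiced, f0, amp, sr=16000, frame=0.01):
    """voiced: per-frame bool; f0: per-frame Hz; amp: per-frame linear gain."""
    n = int(frame * sr)                      # samples per frame
    out = np.zeros(len(voiced) * n)
    phase = 0.0
    t = np.arange(n) / sr
    for i, (v, f, a) in enumerate(zip(voiced, f0, amp)):
        if v:                                # voiced: periodic carrier at F0
            seg = np.sign(np.sin(phase + 2 * np.pi * f * t))
            phase = (phase + 2 * np.pi * f * n / sr) % (2 * np.pi)
        else:                                # voiceless: noise excitation
            seg = np.random.randn(n)
        out[i * n:(i + 1) * n] = a * seg     # amplitude-envelope cue
    # fixed first-order lowpass as a stand-in for the invariant spectral envelope
    return lfilter([0.05], [1.0, -0.95], out)
```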

    A mixed inventory structure for German concatenative synthesis

    In speech synthesis by unit concatenation, a central issue is the definition of the unit inventory. Diphone and demisyllable inventories are widely used, but both unit types have their drawbacks. This paper describes a mixed inventory structure that is syllable-oriented but does not demand a definite decision about the position of a syllable boundary. In defining the inventory, the results of a comprehensive investigation of coarticulatory phenomena at syllable boundaries were used, as well as a machine-readable pronunciation dictionary. An evaluation comparing the mixed inventory with a demisyllable and a diphone inventory confirms that speech generated with the mixed inventory is superior in general acceptance. A segmental intelligibility test shows the high intelligibility of the synthetic speech.
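
    To make the mixed-inventory idea concrete, here is a toy sketch of unit lookup with fallback: larger syllable-oriented units are preferred, with smaller units (demisyllable- or diphone-like) as fallbacks stored in the same inventory. The keys, the greedy longest-match strategy, and the four-phone cap are illustrative assumptions, not the paper's actual selection scheme.

```python
from typing import Dict, List

def select_units(phones: List[str], inventory: Dict[str, str]) -> List[str]:
    """Greedily cover a phone sequence with the longest available units."""
    units, i = [], 0
    while i < len(phones):
        # try the longest candidate unit first, then shorter fallbacks
        for span in range(min(4, len(phones) - i), 0, -1):
            key = "-".join(phones[i:i + span])
            if key in inventory:
                units.append(inventory[key])
                i += span
                break
        else:
            raise KeyError(f"no unit covers {phones[i]}")
    return units

# e.g. select_units(["h", "a", "l", "o"],
#                   {"h-a-l": "unit_hal.wav", "o": "unit_o.wav"})
```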

    Does training with amplitude modulated tones affect tone-vocoded speech perception?

    Temporal-envelope cues are essential for successful speech perception. We asked here whether training on stimuli containing temporal-envelope cues but no speech content can improve the perception of spectrally degraded (vocoded) speech, in which the temporal envelope (but not the temporal fine structure) is largely preserved. Two groups of listeners were trained on different amplitude-modulation (AM) tasks, either AM detection or AM-rate discrimination (21 blocks of 60 trials over two days, 1260 trials; modulation frequencies: 4 Hz, 8 Hz, and 16 Hz), while an additional control group did not undertake any training. Consonant identification in vocoded vowel-consonant-vowel stimuli was tested before and after training on the AM tasks (or at an equivalent time interval for the control group). Following training, only the trained groups showed a significant improvement in the perception of vocoded speech, but the improvement did not significantly differ from that observed for controls. Thus, we find no convincing evidence that this amount of training with temporal-envelope cues but no speech content provides a significant benefit for vocoded speech intelligibility. Alternative training regimens using vocoded speech along the linguistic hierarchy should be explored.
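
    For context, a tone vocoder replaces each analysis band of the speech signal with a pure tone modulated by that band's temporal envelope, preserving envelope cues while discarding temporal fine structure. Below is a minimal sketch; the band edges, filter order, and envelope extraction are plausible assumptions rather than the study's actual parameters.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def tone_vocode(x, sr, edges=(100, 400, 1000, 2400, 6000)):
    """Resynthesize x from band envelopes imposed on sine carriers."""
    t = np.arange(len(x)) / sr
    y = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="band", fs=sr, output="sos")
        band = sosfiltfilt(sos, x)            # analysis band
        env = np.abs(hilbert(band))           # temporal envelope of the band
        fc = np.sqrt(lo * hi)                 # carrier at the geometric mean
        y += env * np.sin(2 * np.pi * fc * t)
    return y / np.max(np.abs(y))              # peak-normalise
```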

    Hemispheric Asymmetries in Speech Perception: Sense, Nonsense and Modulations

    Background: The well-established left-hemisphere specialisation for language processing has long been claimed to rest on a low-level auditory specialisation for specific acoustic features of speech, particularly 'rapid temporal processing'.
    Methodology: A novel analysis/synthesis technique was used to construct a variety of sounds based on simple sentences, which could be manipulated in spectro-temporal complexity and in whether or not they were intelligible. All sounds consisted of two noise-excited spectral prominences (based on the lower two formants of the original speech) which could be static or could vary in frequency and/or amplitude independently. Dynamically varying both acoustic features based on the same sentence led to intelligible speech, but when either or both features were static the stimuli were not intelligible. Using the frequency dynamics of one sentence with the amplitude dynamics of another led to unintelligible sounds of spectro-temporal complexity comparable to the intelligible ones. Positron emission tomography (PET) was used to compare which brain regions were active when participants listened to the different sounds.
    Conclusions: Neural activity to spectral and amplitude modulations sufficient to support speech intelligibility (without actually being intelligible) was seen bilaterally, with a right temporal lobe dominance. A left-dominant response was seen only to intelligible sounds. It thus appears that the left-hemisphere specialisation for speech is based on the linguistic properties of utterances, not on particular acoustic features.
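
    As a rough sketch of the stimulus construction (frame-based filtering is an assumed realisation; the paper used its own analysis/synthesis technique), each prominence can be generated as bandpass-filtered noise whose centre frequency and amplitude follow per-frame tracks that are either static or derived from a sentence:

```python
import numpy as np
from scipy.signal import butter, sosfilt

def noise_prominence(freq_track, amp_track, sr=16000, frame=0.01, bw=0.25):
    """One spectral prominence: bandpass-filtered noise whose centre
    frequency and amplitude are set frame by frame."""
    n = int(frame * sr)
    out = np.zeros(len(freq_track) * n)
    for i, (f, a) in enumerate(zip(freq_track, amp_track)):
        sos = butter(2, [f * (1 - bw), f * (1 + bw)], btype="band",
                     fs=sr, output="sos")
        out[i * n:(i + 1) * n] = a * sosfilt(sos, np.random.randn(n))
    return out

# A two-prominence stimulus is the sum of two independent tracks, e.g.:
# stim = noise_prominence(f1, a1) + noise_prominence(f2, a2)
```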

    Sensitivity of the human auditory cortex to acoustic degradation of speech and non-speech sounds

    The perception of speech is usually an effortless and reliable process, even in highly adverse listening conditions. Besides interference from external sound sources, the intelligibility of speech can be reduced by degradation of the structure of the speech signal itself, for example by digital compression of sound. This kind of distortion may be even more detrimental to intelligibility than external distortion, since the auditory system cannot use sound-source-specific acoustic features, such as spatial location, to separate the distortion from the speech signal. The perceptual consequences of acoustic distortions for speech intelligibility have been studied extensively. However, the cortical mechanisms of speech perception in adverse listening conditions are not well understood, particularly when the speech signal itself is distorted. The aim of this thesis was to investigate the cortical mechanisms underlying speech perception in conditions where speech is less intelligible due to external distortion or digital compression. In the four studies of this thesis, the intelligibility of speech was varied either by digital compression or by the addition of stochastic noise, and cortical activity related to the speech stimuli was measured using magnetoencephalography (MEG). The results indicated that degrading speech sounds by digital compression enhanced the evoked responses originating from the auditory cortex, whereas adding stochastic noise did not modulate the cortical responses. Furthermore, when the distortion was presented continuously in the background, the transient activity of the auditory cortex was delayed as the intensity of the distortion increased. On the perceptual level, digital compression reduced the comprehensibility of speech more than additive stochastic noise. It was also demonstrated that prior knowledge of speech content substantially enhanced the intelligibility of distorted speech, and this perceptual change was associated with increased cortical activity in several regions adjacent to the auditory cortex. In conclusion, the results of this thesis show that the auditory cortex is very sensitive to the acoustic features of the distortion, while at later processing stages several cortical areas reflect the intelligibility of speech. These findings suggest that the auditory system adapts rapidly to the variability of the auditory environment and can efficiently use prior knowledge of speech content in deciphering acoustically degraded speech signals.
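
    One of the two degradations, additive stochastic noise, can be sketched as follows. This is a generic illustration of mixing Gaussian noise at a target signal-to-noise ratio, not the thesis' exact noise type or levels:

```python
import numpy as np

def add_stochastic_noise(speech, snr_db):
    """Degrade speech with Gaussian noise at a target SNR in dB."""
    noise = np.random.randn(len(speech))
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # scale noise so that 10*log10(p_speech / p_noise_scaled) = snr_db
    noise *= np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + noise
```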

    The Natural Statistics of Audiovisual Speech

    Humans, like other animals, are exposed to a continuous stream of signals that are dynamic, multimodal, extended, and time-varying in nature. This complex input space must be transduced and sampled by our sensory systems and transmitted to the brain, where it can guide the selection of appropriate actions. To simplify this process, it has been suggested that the brain exploits statistical regularities in the stimulus space. Tests of this idea have largely been confined to unimodal signals and natural scenes. One important class of multisensory signals for which a quantitative characterization of the input space is unavailable is human speech. We do not understand which signals the brain must actively piece together from an audiovisual speech stream to arrive at a percept, versus what is already embedded in the signal structure of the stream itself. In essence, we lack a clear understanding of the natural statistics of audiovisual speech. In the present study, we identified the following major statistical features of audiovisual speech. First, we observed robust correlations and close temporal correspondence between the area of the mouth opening and the acoustic envelope. Second, we found the strongest correlation between the area of the mouth opening and vocal tract resonances. Third, we observed that both the area of the mouth opening and the voice envelope are temporally modulated in the 2–7 Hz frequency range. Finally, we show that the timing of mouth movements relative to the onset of the voice is consistently between 100 and 300 ms. We interpret these data in the context of recent neural theories of speech which suggest that speech communication is a reciprocally coupled, multisensory event, whereby the outputs of the signaler are matched to the neural processes of the receiver.
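
    Two of these measurements are straightforward to sketch: the zero-lag correlation between a mouth-area track and the acoustic envelope, and the fraction of modulation power falling in the 2–7 Hz band. The function below is an illustrative assumption (the paper's actual pipeline, lag analysis, and spectral estimator differ), and it assumes both tracks are pre-extracted and share one sampling rate:

```python
import numpy as np

def audiovisual_stats(mouth_area, envelope, fs):
    """mouth_area and envelope: equally sampled tracks at rate fs (Hz)."""
    r = np.corrcoef(mouth_area, envelope)[0, 1]     # zero-lag correlation
    spec = np.abs(np.fft.rfft(envelope - envelope.mean())) ** 2
    freqs = np.fft.rfftfreq(len(envelope), d=1.0 / fs)
    in_band = spec[(freqs >= 2) & (freqs <= 7)].sum() / spec.sum()
    return r, in_band   # correlation; fraction of modulation power in 2-7 Hz
```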